fix: Do CC reaction before largest_acked #2117

Open · wants to merge 3 commits into base: main
Conversation

larseggert (Collaborator)

Packets are only declared as lost relative to `largest_acked`. If we hit a PTO while we don't have a `largest_acked` yet, also do a congestion control reaction (because otherwise none would happen).

Broken out of #1998

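The fix described above can be sketched roughly like this (a minimal illustration with hypothetical type and method names, not the actual neqo code):

```rust
// Minimal sketch of the idea; all names here are hypothetical.

struct CongestionController {
    congestion_window: usize,
    largest_acked: Option<u64>, // largest packet number acknowledged so far
}

impl CongestionController {
    /// React to a congestion signal by halving the window (Reno-style),
    /// clamped to a floor of two datagrams.
    fn congestion_event(&mut self, max_datagram_size: usize) {
        self.congestion_window =
            (self.congestion_window / 2).max(2 * max_datagram_size);
    }

    /// Called when the PTO timer fires. Loss detection declares packets
    /// lost only relative to `largest_acked`, so before the first ACK
    /// arrives nothing can ever be declared lost and no congestion
    /// reaction would happen; trigger one explicitly in that case.
    fn on_pto(&mut self, max_datagram_size: usize) {
        if self.largest_acked.is_none() {
            self.congestion_event(max_datagram_size);
        }
    }
}
```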
codecov bot commented Sep 16, 2024

Codecov Report

Attention: Patch coverage is 97.91667% with 1 line in your changes missing coverage. Please review.

Project coverage is 95.35%. Comparing base (55e3a93) to head (84a2564).

Files with missing lines Patch % Lines
neqo-transport/src/path.rs 83.33% 1 Missing ⚠️
Additional details and impacted files
@@           Coverage Diff           @@
##             main    #2117   +/-   ##
=======================================
  Coverage   95.35%   95.35%           
=======================================
  Files         112      112           
  Lines       36336    36353   +17     
=======================================
+ Hits        34648    34665   +17     
  Misses       1688     1688           


github-actions bot commented Sep 16, 2024

Failed Interop Tests

QUIC Interop Runner, client vs. server: neqo-latest as client · neqo-latest as server

All results

Succeeded Interop Tests

QUIC Interop Runner, client vs. server: neqo-latest as client · neqo-latest as server

Unsupported Interop Tests

QUIC Interop Runner, client vs. server: neqo-latest as client · neqo-latest as server

github-actions bot commented Sep 16, 2024

Benchmark results

Performance differences relative to 55e3a93.

coalesce_acked_from_zero 1+1 entries: 💚 Performance has improved.
       time:   [99.172 ns 99.508 ns 99.849 ns]
       change: [-12.518% -12.118% -11.754%] (p = 0.00 < 0.05)

Found 14 outliers among 100 measurements (14.00%)
13 (13.00%) high mild
1 (1.00%) high severe

coalesce_acked_from_zero 3+1 entries: 💚 Performance has improved.
       time:   [116.98 ns 117.37 ns 117.77 ns]
       change: [-33.509% -33.205% -32.883%] (p = 0.00 < 0.05)

Found 16 outliers among 100 measurements (16.00%)
1 (1.00%) low mild
1 (1.00%) high mild
14 (14.00%) high severe

coalesce_acked_from_zero 10+1 entries: 💚 Performance has improved.
       time:   [116.41 ns 116.86 ns 117.39 ns]
       change: [-39.701% -34.919% -31.350%] (p = 0.00 < 0.05)

Found 12 outliers among 100 measurements (12.00%)
3 (3.00%) low severe
1 (1.00%) low mild
2 (2.00%) high mild
6 (6.00%) high severe

coalesce_acked_from_zero 1000+1 entries: 💚 Performance has improved.
       time:   [97.101 ns 97.244 ns 97.401 ns]
       change: [-32.461% -32.017% -31.593%] (p = 0.00 < 0.05)

Found 6 outliers among 100 measurements (6.00%)
3 (3.00%) high mild
3 (3.00%) high severe

RxStreamOrderer::inbound_frame(): Change within noise threshold.
       time:   [111.73 ms 111.87 ms 112.09 ms]
       change: [+0.2883% +0.4298% +0.6314%] (p = 0.00 < 0.05)

Found 3 outliers among 100 measurements (3.00%)
2 (2.00%) low mild
1 (1.00%) high severe

transfer/pacing-false/varying-seeds: No change in performance detected.
       time:   [26.276 ms 27.355 ms 28.450 ms]
       change: [-9.3568% -4.2882% +0.8907%] (p = 0.12 > 0.05)

Found 3 outliers among 100 measurements (3.00%)
1 (1.00%) low mild
2 (2.00%) high mild

transfer/pacing-true/varying-seeds: No change in performance detected.
       time:   [34.967 ms 36.613 ms 38.279 ms]
       change: [-10.397% -4.4172% +1.7587%] (p = 0.16 > 0.05)

Found 1 outlier among 100 measurements (1.00%)
1 (1.00%) high mild

transfer/pacing-false/same-seed: Change within noise threshold.
       time:   [24.989 ms 25.912 ms 26.848 ms]
       change: [-9.6160% -5.2548% -0.8232%] (p = 0.02 < 0.05)

Found 4 outliers among 100 measurements (4.00%)
1 (1.00%) low mild
3 (3.00%) high mild

transfer/pacing-true/same-seed: No change in performance detected.
       time:   [40.251 ms 42.323 ms 44.460 ms]
       change: [-9.4232% -3.2176% +3.2552%] (p = 0.33 > 0.05)

1-conn/1-100mb-resp (aka. Download)/client: 💚 Performance has improved.
       time:   [112.15 ms 112.55 ms 112.92 ms]
       thrpt:  [885.59 MiB/s 888.51 MiB/s 891.65 MiB/s]
change:
       time:   [-3.3400% -2.9120% -2.4998%] (p = 0.00 < 0.05)
       thrpt:  [+2.5639% +2.9994% +3.4554%]

Found 6 outliers among 100 measurements (6.00%)
1 (1.00%) low severe
5 (5.00%) low mild

1-conn/10_000-parallel-1b-resp (aka. RPS)/client: No change in performance detected.
       time:   [313.64 ms 317.26 ms 320.89 ms]
       thrpt:  [31.163 Kelem/s 31.520 Kelem/s 31.884 Kelem/s]
change:
       time:   [-2.6213% -0.9663% +0.6976%] (p = 0.25 > 0.05)
       thrpt:  [-0.6928% +0.9757% +2.6919%]

1-conn/1-1b-resp (aka. HPS)/client: Change within noise threshold.
       time:   [33.964 ms 34.154 ms 34.363 ms]
       thrpt:  [29.101  elem/s 29.279  elem/s 29.443  elem/s]
change:
       time:   [+0.4719% +1.2255% +2.0762%] (p = 0.00 < 0.05)
       thrpt:  [-2.0340% -1.2106% -0.4697%]

Found 4 outliers among 100 measurements (4.00%)
4 (4.00%) high severe

Client/server transfer results

Transfer of 33554432 bytes over loopback.

Client  Server  CC     Pacing  Mean [ms]       Min [ms]  Max [ms]  Relative
msquic  msquic                 177.5 ±  95.5     92.5     367.0    1.00
neqo    msquic  reno   on      258.3 ±  65.0    218.0     447.1    1.00
neqo    msquic  reno           269.2 ±  90.7    218.3     464.8    1.00
neqo    msquic  cubic  on      243.8 ±  64.8    210.5     448.5    1.00
neqo    msquic  cubic          222.0 ±   9.3    210.1     238.4    1.00
msquic  neqo    reno   on      144.3 ±  89.7     83.6     357.2    1.00
msquic  neqo    reno           132.1 ±  87.8     82.5     341.6    1.00
msquic  neqo    cubic  on      110.3 ±  62.6     82.1     366.5    1.00
msquic  neqo    cubic          115.3 ±  62.6     80.4     333.0    1.00
neqo    neqo    reno   on      182.8 ±  82.3    128.6     422.0    1.00
neqo    neqo    reno           223.6 ± 113.2    127.3     439.3    1.00
neqo    neqo    cubic  on      268.6 ± 142.4    136.8     622.5    1.00
neqo    neqo    cubic          202.8 ±  73.2    133.7     397.8    1.00

github-actions bot commented Sep 16, 2024

Firefox builds for this PR

The following builds are available for testing. Crossed-out builds did not succeed.

larseggert (Collaborator, Author)

@martinthomson I'd appreciate a review, since the code I am touching is pretty complex.

martinthomson (Member) left a comment

The structure looks fine, but this seems overly conservative to me.

@@ -337,6 +337,41 @@ impl<T: WindowAdjustment> CongestionControl for ClassicCongestionControl<T> {
congestion || persistent_congestion
}

/// Handle a congestion event.

martinthomson (Member)

Is this just a move? It looks like it.

larseggert (Collaborator, Author)

Yes, but from ClassicCongestionControl to CongestionControl.
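For readers following along, the refactor pattern being discussed usually looks like this in Rust: a method that lived only on the concrete type becomes a provided method on the trait, shared by every implementation. This is a hypothetical sketch; neqo's actual trait surface and signatures differ.

```rust
// Hypothetical sketch of moving a method from a concrete impl onto a trait.

trait CongestionControl {
    fn cwnd(&self) -> usize;
    fn set_cwnd(&mut self, cwnd: usize);

    /// Handle a congestion event. Previously a method on the concrete
    /// type; as a default trait method it now applies to all impls.
    fn congestion_event(&mut self, max_datagram_size: usize) {
        let halved = (self.cwnd() / 2).max(2 * max_datagram_size);
        self.set_cwnd(halved);
    }
}

struct ClassicCongestionControl {
    congestion_window: usize,
}

impl CongestionControl for ClassicCongestionControl {
    fn cwnd(&self) -> usize {
        self.congestion_window
    }
    fn set_cwnd(&mut self, cwnd: usize) {
        self.congestion_window = cwnd;
    }
}
```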

Comment on lines +854 to +856
// Packets are only declared as lost relative to `largest_acked`. If we hit a PTO while
// we don't have a largest_acked yet, also do a congestion control reaction (because
// otherwise none would happen).

martinthomson (Member)

Are we sure that we want to slow down? If we haven't received an ACK, we should be at a low send rate. Do we really want to slow down further?

One thing that concerns me here is that a fast PTO (if we ever enable that) will hit this condition more often and that might not be good for performance. The same goes for long RTT connections, which might be OK while we are still retransmitting within the RTT, but once we get an RTT estimate that is long, we'll slow our send rate by a huge amount for the false PTOs we've hit.

larseggert (Collaborator, Author)

I think the argument is that the combination of initial cwnd and initial RTT (= PTO) was chosen with some sort of safety criterion in mind. A PTO is an indication that either the RTT is much longer than the default assumption, or there is quite a bit of loss. In either of these two cases, we probably want to slow down?
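For context, RFC 9002 derives the PTO from the RTT estimate, seeded from a default initial RTT before any sample exists. A minimal sketch of that formula (the function name is my own; the constants follow the RFC):

```rust
// Sketch of the PTO computation from RFC 9002, Section 6.2. Before any
// RTT sample exists, smoothed_rtt is seeded from kInitialRtt (333 ms)
// and rttvar from kInitialRtt / 2, so an early PTO implies that the
// real RTT is much longer than that default, or that loss is heavy.
use std::time::Duration;

const GRANULARITY: Duration = Duration::from_millis(1); // kGranularity

fn pto(smoothed_rtt: Duration, rttvar: Duration, max_ack_delay: Duration) -> Duration {
    smoothed_rtt + (rttvar * 4).max(GRANULARITY) + max_ack_delay
}
```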

martinthomson (Member)

I guess my main concern is that if we want to be more aggressive about PTO, such as sending another Initial we create a situation where slowing down is not the result of a misunderstanding of the RTT, but a deliberate choice to send more to start with.

2 participants